Learning in non-stationary Partially Observable Markov Decision Processes
Authors
Abstract
We study the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is not perfectly known and may change over time. We present the algorithm MEDUSA+, which incrementally improves a POMDP model using selected queries, while still optimizing the reward. Empirical results show the response of the algorithm to changes in the parameters of a model: the changes are learned quickly and the agent still accumulates high reward throughout the process.
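To make the abstract's idea concrete, here is a minimal sketch (not the paper's exact procedure) of query-driven model learning under non-stationarity: Dirichlet pseudo-counts over the transition model grow from oracle state queries and decay over time, so the estimate can track a change in the environment. The class name, query rule, decay factor, and toy environment below are all illustrative assumptions; MEDUSA+ additionally learns the observation model and acts using policies solved from sampled POMDPs.

```python
import numpy as np

class MedusaLikeLearner:
    """Minimal sketch of MEDUSA-style model learning: Dirichlet
    pseudo-counts over the transition model T(s'|s,a) are grown from
    oracle state queries and decayed every step, so the estimate can
    track a non-stationary environment. (MEDUSA+ also learns the
    observation model and acts with policies solved from sampled
    POMDPs; that machinery is omitted here.)"""

    def __init__(self, n_states, n_actions, prior=1.0, decay=0.99):
        # One Dirichlet per (s, a) pair over successor states.
        self.alpha = np.full((n_states, n_actions, n_states), prior)
        self.decay = decay        # assumed forgetting factor
        self.min_count = 20.0     # assumed query-stopping confidence

    def mean_model(self):
        # Posterior-mean estimate of T(s' | s, a).
        return self.alpha / self.alpha.sum(axis=2, keepdims=True)

    def should_query(self, s, a):
        # Illustrative rule: keep querying while the Dirichlet for
        # (s, a) is diffuse (few effective samples remain after decay).
        return self.alpha[s, a].sum() < self.min_count

    def step(self, s, a, s_next):
        # Forget a little everywhere, then add evidence only if we
        # paid for an oracle query revealing the true transition.
        self.alpha *= self.decay
        if self.should_query(s, a):
            self.alpha[s, a, s_next] += 1.0

# Toy run: the true transition model drifts halfway through, and the
# decayed counts let the learner re-adapt without re-initialization.
rng = np.random.default_rng(0)
learner = MedusaLikeLearner(n_states=2, n_actions=1)
true_T = np.array([[0.9, 0.1], [0.2, 0.8]])
for t in range(600):
    if t == 300:
        true_T = np.array([[0.3, 0.7], [0.2, 0.8]])
    s = rng.integers(2)
    s_next = rng.choice(2, p=true_T[s])
    learner.step(s, 0, s_next)
print(np.round(learner.mean_model()[:, 0, :], 2))  # near the drifted model
```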
Similar papers
Learning Stationary Temporal Probabilistic Networks
The paper describes a method for learning representations of partially observable Markov decision processes in the form of temporal probabilistic networks, which can subsequently be used by robotic agents for action planning and policy determination. A solution is provided to the problem of enforcing stationarity of the learned Markov model. Several preliminary experiments are described that co...
Solving Hidden-Semi-Markov-Mode Markov Decision Problems
Hidden-Mode Markov Decision Processes (HM-MDPs) were proposed to represent sequential decision-making problems in non-stationary environments that evolve according to a Markov chain. We introduce in this paper Hidden-Semi-Markov-Mode Markov Decision Processes (HS3MDPs), a generalization of HM-MDPs to the more realistic case of non-stationary environments evolving according to a semi-Markov chai...
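For intuition, the sketch below simulates the hidden dynamics this snippet describes: a mode that persists for a sampled duration before switching according to a Markov chain over modes. The two modes, the transition matrix, and the shifted-Poisson dwell times are assumed toy choices, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Markov chain over two hidden modes: when a mode's dwell time
# expires, the next mode is drawn from the current mode's row.
mode_T = np.array([[0.1, 0.9],
                   [0.8, 0.2]])

def sample_duration(mode):
    # Semi-Markov ingredient: each mode has its own dwell-time
    # distribution (a shifted Poisson here, an arbitrary choice).
    # An HM-MDP is the special case where dwell times are geometric,
    # i.e. a mode switch is possible at every single step.
    return 1 + rng.poisson(3 if mode == 0 else 8)

def mode_trajectory(n_steps):
    # The hidden mode sequence the agent never observes directly;
    # the MDP dynamics at step t would be indexed by traj[t].
    traj, mode, remaining = [], 0, sample_duration(0)
    for _ in range(n_steps):
        traj.append(mode)
        remaining -= 1
        if remaining == 0:
            mode = rng.choice(len(mode_T), p=mode_T[mode])
            remaining = sample_duration(mode)
    return traj

print(mode_trajectory(30))  # long runs of 0s and 1s, not rapid flips
```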
Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes
It is well known that any finite state Markov decision process (MDP) has a deterministic memoryless policy that maximizes the discounted long-term expected reward. Hence for such MDPs the optimal control problem can be solved over the set of memoryless deterministic policies. In the case of partially observable Markov decision processes (POMDPs), where there is uncertainty about the world state,...
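The classical result this snippet cites can be seen concretely with value iteration on a toy MDP: the greedy policy extracted from the converged value function picks one fixed action per state, so it is deterministic and memoryless. The 2-state, 2-action MDP below is an assumed example, not taken from the paper.

```python
import numpy as np

# Toy 2-state, 2-action MDP (an assumed example).
T = np.array([[[0.8, 0.2],    # T[a, s, s']: dynamics under action 0
               [0.1, 0.9]],
              [[0.5, 0.5],    # dynamics under action 1
               [0.6, 0.4]]])
R = np.array([[1.0, 0.0],     # R[s, a]: immediate reward
              [0.0, 2.0]])
gamma = 0.9

# Value iteration: V converges to the optimal value function.
V = np.zeros(2)
for _ in range(1000):
    # Q[s, a] = R[s, a] + gamma * sum_s' T[a, s, s'] * V[s']
    Q = R + gamma * np.einsum('ast,t->sa', T, V)
    V = Q.max(axis=1)

# The greedy policy is deterministic and memoryless: one fixed
# action per state maximizes discounted long-term reward.
policy = Q.argmax(axis=1)
print(policy, np.round(V, 3))
```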
Hidden-Mode Markov Decision Processes
Samuel P. M. Choi, Dit-Yan Yeung, and Nevin L. Zhang (Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong). Traditional reinforcement learning (RL) assumes that environment dynamics do not change over time (i.e., are stationary). This assumption, however, is not realistic in many real-...
Good Policies for Partially-observable Markov Decision Processes Are Hard to Find
Optimal policy computation in finite-horizon Markov decision processes is a classical problem in optimization with many practical applications. For stationary policies and infinite horizon it is known to be solvable in polynomial time by linear programming, whereas for finite horizon it is a longstanding open problem. We consider this problem for a slightly generalized model, namely partially-obse...